DEVELOPMENT AND APPLICATION OF A
GRADIENT METHOD FOR SOLVING 
DIFFERENTIAL GAMES* 

By David A. Roberts and Raymond C. Montgomery 
Langley Research Center 

SUMMARY 

A gradient technique for solving n-dimensional differential games is developed
in this paper and applied to two example games. The method requires selecting nominal
controls for both players and then improving these controls iteratively until the
optimal controls are found. An iteration scheme is recommended, consisting of a
minimization phase employing only steepest descent followed by a phase employing
alternately steepest descent and steepest ascent. The gradient method has been applied
to two pursuit-evasion games: the first, a two-dimensional game similar to the
homicidal chauffeur but modified to resemble an airplane-helicopter engagement; and the
second, a five-dimensional game of two airplanes at constant altitude and with thrust
and turning controls. In both games, the performance function to be optimized by the
pursuer and evader was the distance between the evader and a given target point in front
of the pursuer.

The analytic solution to the first game is found and compared with the gradient solu- 
tion. The outcome of the gradient method is strongly dependent on the nominal controls 
selected in some cases. However, the analytic solution reveals the existence of both 
unique and nonunique solutions, depending on the initial conditions selected. A compari- 
son between the gradient results and the analytic solution shows that the dependence on the 
nominal controls occurs only in regions where nonunique solutions exist. In the unique 
solution region, the results from the two methods agree very closely. 

The application of the gradient method to the five-dimensional two-airplane game is
illustrated for one set of initial conditions. These results are also shown to be dependent 
on the nominal controls selected and indicate that these initial conditions are in a region 
of nonunique solutions. 

*Part of the information presented in this paper was included in a thesis entitled 
"The Application of Gradient Methods to Differential Games" submitted by David A. 
Roberts in partial fulfillment of the requirements for the degree of Master of Science, 
George Washington University, May 1971. 



INTRODUCTION 


Analytic methods useful in studying pursuit-evasion are principally derived from
game theory. A game can be considered as a multiple-decision process in which at least 
two players take part and, in some manner, have conflicting objectives. A game has 
three basic features: a starting condition, a set of rules governing the evolution of the 
play, and a termination condition. In a pursuit-evasion game involving two airplanes,
the airplane equations of motion and the controls available to the pilots define the rules 
of the game. A game in which these rules are stated as differential equations is called 
a "differential game." Obviously, each player wishes to select his controls to "play" in 
the best manner, that is to achieve some objective. In a differential game, this objective 
or "payoff" is usually formed as a quantitative measure of how well the game is played. 
Note that the pursuer could have a payoff different from that of the evader. The objective 
then of each player is to select the controls that optimize his payoff. 

To determine the optimal strategies for the players, one must select a rationale for
their actions. An example is to select a set of evasive maneuvers, either open-loop or
closed-loop, and to determine pursuer maneuvers that lead to optimal pursuer payoff.
The difficulty with this approach is that the optimal pursuer maneuvers depend strongly
on the evasive maneuvers selected. An improved rationale is to assume that each
player selects a strategy based on receiving optimal payoff for himself while assuming
that his opponent will accordingly select a strategy to optimize his own payoff. If,
furthermore, the payoffs for the two opponents always add identically to zero, the game
is called a zero-sum differential game because one player attempts to maximize the
payoff P while the other attempts to minimize the same quantity.

The purpose of this paper is to develop and apply a gradient method for solving
zero-sum differential games. Present solution methods rely heavily on indirect or
analytic methods similar to those originated by Isaacs. (See ref. 1.) However, these
methods are currently restricted to problems where the governing equations are linear
or contain simple nonlinearities and to problems of low state dimensions. Direct
methods that have proven to be successful in optimization theory (ref. 2) have not
generally been used to solve games. However, Taylor (ref. 3) has applied Balakrishnan's
Epsilon technique to a pursuit-evasion problem. His method has been applied to
problems in which the payoff is the time required to reach a specified position regardless
of orientation. Recently, Baron et al. (ref. 4) have developed a direct method which is global
in nature (that is, the solution is found for all initial conditions). However, the computer
storage required to attain this global nature limits the method's applicability. The
gradient method described in this report numerically determines the optimal controls by
iterating on some nominal set of controls. Since this method is not limited by the
restrictions discussed, it may readily be applied to more realistic differential games.
To ensure convergence, this method is limited to games in which either the minimum or
the maximum of the payoff can first be found for a fixed action on the part of the other
player. Then, once this optimization problem is solved, both players can be permitted
to optimize to find the min-max or max-min solution. For high-dimensional problems,
the saddle-point solution may not exist; that is, the min-max solution and the max-min
solution may not be the same. The iteration scheme as presented in the appendix will
ensure convergence only to a max-min solution. However, the solutions presented in
this paper are believed to be also saddle-point solutions. The terms "saddle point,"
"min-max," and "max-min" will then be used interchangeably to describe the solutions
with the understanding that they may not be the same in more complicated problems.

SYMBOLS 


$a_n$               normal acceleration, g

$C$                 terminal manifold

$C_1, C_2$          constant weightings in payoff

$C_{D,i}$           induced drag coefficient

$C_{D,0}$           drag coefficient at zero angle of attack

$\vec{f}$           function for equations of motion

$g$                 gravitational acceleration constant, 9.8 m/sec^2

$H$                 Hamiltonian defined in equations (20)

$I$                 n x n identity matrix

$J(\vec{x})$        scalar termination function

$K_u$               pursuer gradient gain

$K_v$               evader gradient gain

$m$                 airplane mass, kg

$n$                 dimension of system in state space

$P$                 scalar payoff

$R$                 range of target point, m

$R_c$               minimum turning radius, m

$r$                 distance from target point, m

$S$                 airplane wing area, m^2

$S_2, S_3$          parameters of playing space

$T$                 airplane thrust, N

$t$                 forward time, independent variable

$\vec{u}$           pursuer control vector, m x 1

$V$                 airplane velocity, m/sec; Value of game, min-max payoff

$V_i$               Value derivatives, $\partial V/\partial x_i$

$\vec{v}$           evader control vector, m x 1

$\omega$            pursuer's angular velocity

$w_1$               pursuer's velocity, m/sec

$w_2$               evader's velocity, m/sec

$x, y$              axes for inertial coordinates

$\vec{x}$           state vector, n x 1

$\theta$            angular orientation of players

$\rho$              atmospheric density, kg/m^3

$\sigma$            independent variable of steepest descent

$\Delta\sigma$      increment in $\sigma$

$\tau$              variable of integration, or retrograde time, sec

$\Phi$              n x n matrix defined by equation (15)

$\phi$              pursuer's turning control

$\Psi$              n x n transition matrix defined by equation (11)

$\psi$              velocity angle control for evader, rad

$\nabla_x \vec{f}, A$   n x n gradient matrix, $A_{ij} = \partial f_i/\partial x_j$

$\nabla_u \vec{f}, B$   n x m pursuer control gradient matrix, $B_{ij} = \partial f_i/\partial u_j$

$\nabla_v \vec{f}, C$   n x m evader control gradient matrix, $C_{ij} = \partial f_i/\partial v_j$

$\nabla_x^T J$      1 x n row vector whose ith element is $\partial J/\partial x_i$

$\nabla_x^T P$      1 x n row vector whose ith element is $\partial P/\partial x_i$

Subscripts: 

A airplane A 

B airplane B 

f evaluation at terminal time 

i,j ith and jth components 

nom nominal controls or path 

o initial value

Superscripts:


T transpose of vector or matrix 

* optimal controls 

Vectors are denoted by arrows; quantities without arrows are scalar. A dot over
a symbol denotes differentiation with respect to time. The symbol $\triangleq$ means "is defined
as" and the symbol $\equiv$ denotes identically equal terms.

PROBLEM STATEMENT 

Generally, for any two airplanes the equations governing the evolution of an 
engagement involve a set of quantities x — referred to as the state — which describe 
the positions and velocities of the vehicles, and a set of quantities u and v - referred 
to as the controls — which describe the way in which the individual aircraft may affect 
the evolution of the engagement. 

The first step in determining the relative superiority of two opposing airplanes 
is to decide, for each, how much value will be attached to being in a certain state — a 
certain position relative to the opponent. Indeed, if the method of gradients is to be
used, the weighting should be determined so that a single number P is associated
with each state $\vec{x}$. This function $P(\vec{x})$ is referred to as the "payoff" and is ordered
so that $P(\vec{x}) = 0$ corresponds to ideal capture and $P(\vec{x}) \to \infty$ corresponds to escape
for the evader. The numerical value of the function P for an airplane (designated A)
thus provides a measure of how "far" the other airplane (designated B) is from being placed
in an unfavorable situation. This ordering of the situation is arbitrary. In a general
game, the cost structure $P_A(\vec{x})$ for airplane A may differ from that of airplane B,
$P_B(\vec{x})$. The analysis presented in this report is restricted to cases where
$P_A(\vec{x}) = P(\vec{x}) = -P_B(\vec{x})$, that is, to zero-sum differential games.

To illustrate these concepts, consider a simplified model of pursuit-evasion, which
is restricted to maneuvers in a horizontal plane. This particular model has eight state
variables made up of the absolute position coordinates and heading of both vehicles and
control variables such as the accelerations normal to and along the flight path of each
vehicle. Figure 1 illustrates the geometry and shows the state variables for each
airplane. A typical example of the payoff for airplane A is presented in figure 2. In this
figure the coordinates represent the relative position of airplane B in the horizontal
plane with the $x_1$-axis directed along the velocity of airplane A. Contours for constant
values of the function $P(\vec{x})$ are indicated in the figure. For this example, the most
favorable situation for the pursuer was for the evader to be at the pursuer's target point
where P = 0.

To provide an analytic statement of the problem considered in this study, the 
governing differential equations of motion are given in the form 

$\dot{\vec{x}} = \vec{f}(\vec{x},\vec{u},\vec{v})$    (1)

where

$\vec{x}$    an n x 1 state vector

$\vec{u}$    an m x 1 control vector used by pursuer

$\vec{v}$    an m x 1 control vector used by evader

Admissible engagements will be those which satisfy a given initial condition

$\vec{x}(t_0) = \vec{x}_0$    (2)

a given termination condition

$J(\vec{x}(t_f)) = 0$    (3)

and the governing differential equations (1) for a given set of controls $\vec{u}(t)$ and $\vec{v}(t)$
defined on the time interval $t_0 \le t \le t_f$. The payoff of an admissible engagement is

$P(\vec{u},\vec{v}) = P(\vec{x}(t_f))$    (4)

Admissible control functions $\vec{u}$ and $\vec{v}$ which satisfy equations (1) to (4) are said to be
of class C. The problem considered in this paper is to find a set of admissible controls
($\vec{u}^*$ and $\vec{v}^*$) satisfying

$\vec{u}^*(\vec{v}) = \arg\min_{(\vec{u},\vec{v}) \in C} P(\vec{u},\vec{v})$

$\vec{v}^* = \arg\max_{(\vec{u}^*,\vec{v}) \in C} P(\vec{u}^*(\vec{v}),\vec{v})$    (5)

where arg min and arg max indicate the functions that minimize or maximize the
payoff, respectively. The max-min solution to the differential game is termed $P(\vec{u}^*,\vec{v}^*)$.
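The nested structure of the max-min problem (the pursuer's best reply is found first, and the evader then maximizes against that reply) can be illustrated on a finite table. The sketch below is not from the paper: the 2 x 2 payoff table and the function names are hypothetical, and discrete choices stand in for the control functions. It also shows a case where the max-min and min-max values differ, so that no saddle point exists.

```python
# Hypothetical payoff table: rows are pursuer choices u, columns are evader choices v.
PAYOFF = [
    [1.0, 3.0],
    [4.0, 2.0],
]


def max_min(payoff):
    """Evader maximizes over v, knowing the pursuer answers with the minimizing u*(v)."""
    n_u, n_v = len(payoff), len(payoff[0])
    best = None
    for v in range(n_v):
        u_star = min(range(n_u), key=lambda u: payoff[u][v])  # pursuer's best reply u*(v)
        value = payoff[u_star][v]
        if best is None or value > best[0]:
            best = (value, u_star, v)
    return best  # (payoff, u*, v*)


def min_max(payoff):
    """Pursuer minimizes over u, knowing the evader answers with the maximizing v*(u)."""
    n_u, n_v = len(payoff), len(payoff[0])
    best = None
    for u in range(n_u):
        v_star = max(range(n_v), key=lambda v: payoff[u][v])  # evader's best reply v*(u)
        value = payoff[u][v_star]
        if best is None or value < best[0]:
            best = (value, u, v_star)
    return best  # (payoff, u*, v*)


lower = max_min(PAYOFF)
upper = min_max(PAYOFF)
# The max-min value never exceeds the min-max value; equality would indicate a saddle point.
has_saddle_point = lower[0] == upper[0]
```

For this particular table the two values differ, which is the situation the introduction warns may arise in more complicated games.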

GENERAL THEORY 

The basic approach of the gradient method in optimization theory is described in 
references 2 and 5. The same general concepts are used in this paper to obtain solutions 
of differential games (that is, finding controls u* and v* which satisfy equations (5)). 
Figure 3 illustrates the gradient approach to solving this problem. The basic idea is to 
start with a nominal engagement between opponents and to modify this engagement itera- 
tively to reach the desired solution. The procedure for modifying the engagements from 
one iteration to another is based on linearized variational equations. 

To obtain the gradient formulas for modifying the controls, it is convenient to
describe each step in the iterative process by an additional independent variable $\sigma$ which
may be thought of as an iteration number. At $\sigma = 0$, the state and control functions are
those of the nominal engagement and as $\sigma \to \infty$, the state and control functions should
approach the optimal solution sought. Then

$\vec{u}(t,\sigma)|_{\sigma=0} = \vec{u}_{nom}(t)$

$\vec{v}(t,\sigma)|_{\sigma=0} = \vec{v}_{nom}(t)$

and the optimal controls correspond to

$\vec{u}^*(t) \equiv \vec{u}(t,\sigma_f)$

$\vec{v}^*(t) \equiv \vec{v}(t,\sigma_f)$

The iterative process for a typical control as a function of t and $\sigma$ is depicted
in figure 4. The nominal control is at the intersection of the u,t plane and the $\sigma$-axis.
Notice that as $\sigma$ increases, $u(t,\sigma)$ gradually converges to a function considered to be
the optimal. Note also that the final time is not necessarily constant but is instead a
function $t_f(\sigma)$.

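The kinematic equations (eqs. (1)) can be rewritten in the form

$\frac{\partial\vec{x}}{\partial t}(t,\sigma) = \vec{f}(\vec{x}(t,\sigma),\vec{u}(t,\sigma),\vec{v}(t,\sigma))$    (6)

with equation (2) becoming

$\vec{x}(t_0,\sigma) = \vec{x}_0$    (7)

For a given $\vec{u}(t,\sigma)$ and $\vec{v}(t,\sigma)$, equations (6) and (7) may be used to determine $\vec{x}(t,\sigma)$.
The termination time $t_f(\sigma)$ is determined from equation (3) written as

$J(\vec{x}(t_f,\sigma)) = 0$    (8)

If the controls $\vec{u}$ and $\vec{v}$ are modified according to the linearized relations

$\vec{u}(t,\sigma+\Delta\sigma) = \vec{u}(t,\sigma) + \frac{\partial\vec{u}}{\partial\sigma}(t,\sigma)\,\Delta\sigma$

$\vec{v}(t,\sigma+\Delta\sigma) = \vec{v}(t,\sigma) + \frac{\partial\vec{v}}{\partial\sigma}(t,\sigma)\,\Delta\sigma$

the variation in $\vec{x}$ will approximately satisfy

$\vec{x}(t,\sigma+\Delta\sigma) \approx \vec{x}(t,\sigma) + \frac{\partial\vec{x}}{\partial\sigma}(t,\sigma)\,\Delta\sigma$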

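The function $\frac{\partial\vec{x}}{\partial\sigma}(t,\sigma)$ can be obtained by formally differentiating equation (6) with respect
to $\sigma$ and interchanging the order of the t and $\sigma$ differentiation to obtain the linear
equation

$\frac{\partial}{\partial t}\frac{\partial\vec{x}}{\partial\sigma} = A(t,\sigma)\frac{\partial\vec{x}}{\partial\sigma} + B(t,\sigma)\frac{\partial\vec{u}}{\partial\sigma} + C(t,\sigma)\frac{\partial\vec{v}}{\partial\sigma}$    (9)

where

$A(t,\sigma) \triangleq \nabla_x \vec{f}(\vec{x}(t,\sigma),\vec{u}(t,\sigma),\vec{v}(t,\sigma))$

$B(t,\sigma) \triangleq \nabla_u \vec{f}(\vec{x}(t,\sigma),\vec{u}(t,\sigma),\vec{v}(t,\sigma))$

$C(t,\sigma) \triangleq \nabla_v \vec{f}(\vec{x}(t,\sigma),\vec{u}(t,\sigma),\vec{v}(t,\sigma))$

Equation (9) is a linear differential equation which can be solved for $\partial\vec{x}/\partial\sigma$ in terms
of $\partial\vec{u}/\partial\sigma$ and $\partial\vec{v}/\partial\sigma$. Since these last two variables are needed to update the controls, the
solution to equation (9) will be used to find expressions for $\partial\vec{u}/\partial\sigma$ and $\partial\vec{v}/\partial\sigma$ that will
generate the optimal controls. It is known that the solution to equation (9) can be written as
(see ref. 6)

$\frac{\partial\vec{x}}{\partial\sigma}(t,\sigma) = \int_{t_0}^{t} \Psi^T(\tau,t,\sigma)\,B(\tau,\sigma)\,\frac{\partial\vec{u}}{\partial\sigma}(\tau,\sigma)\,d\tau$

$\qquad + \int_{t_0}^{t} \Psi^T(\tau,t,\sigma)\,C(\tau,\sigma)\,\frac{\partial\vec{v}}{\partial\sigma}(\tau,\sigma)\,d\tau + \Psi^T(t_0,t,\sigma)\,\frac{\partial\vec{x}}{\partial\sigma}(t_0,\sigma)$    (10)

where $\Psi(\tau,t,\sigma)$ is a matrix defined by the adjoint differential equation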

$\frac{\partial}{\partial t}\Psi(t,t_0,\sigma) = -A^T(t,\sigma)\,\Psi(t,t_0,\sigma)$    (11)

and with the additional properties that

$\Psi(t_0,t_0,\sigma) = I$

$\Psi(t,t_0,\sigma) = \Psi^{-1}(t_0,t,\sigma)$

$\Psi(t,t_f,\sigma) = \Psi(t,t_0,\sigma)\,\Psi(t_0,t_f,\sigma)$

These latter properties will permit the transition matrix to be integrated numerically 
and simultaneously with the equations of motion in forward time. 
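The forward-time use of these properties can be sketched numerically. The code below is an illustration only, not the paper's program: it assumes a hypothetical constant 2 x 2 matrix A = [[0, w], [-w, 0]] (for which -A^T equals A), integrates the adjoint equation forward from the identity with a fourth-order Runge-Kutta step, and then recovers the matrix referenced to the final time from one inversion and one product.

```python
import math

OMEGA = 0.2  # hypothetical constant turn rate used to build A


def neg_a_transpose():
    # A = [[0, w], [-w, 0]] is skew-symmetric, so -A^T is the same matrix as A.
    return [[0.0, OMEGA], [-OMEGA, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def add_scaled(m, k, s):
    """Return m + s * k, element by element."""
    return [[m[i][j] + s * k[i][j] for j in range(2)] for i in range(2)]


def rk4_step(psi, dt):
    """One Runge-Kutta step of dPsi/dt = -A^T Psi."""
    rate = lambda p: matmul(neg_a_transpose(), p)
    k1 = rate(psi)
    k2 = rate(add_scaled(psi, k1, dt / 2))
    k3 = rate(add_scaled(psi, k2, dt / 2))
    k4 = rate(add_scaled(psi, k3, dt))
    return [[psi[i][j] + dt / 6 * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j])
             for j in range(2)] for i in range(2)]


def forward_transition(t_final, steps):
    """Return Psi(t_k, t0) for k = 0..steps, starting from Psi(t0, t0) = I."""
    dt = t_final / steps
    history = [[[1.0, 0.0], [0.0, 1.0]]]
    for _ in range(steps):
        history.append(rk4_step(history[-1], dt))
    return history


history = forward_transition(t_final=5.0, steps=500)
psi_t0_tf = inverse(history[-1])            # Psi(t0, tf) = inverse of Psi(tf, t0)
k = 200                                     # grid index of t = 2.0
psi_t_tf = matmul(history[k], psi_t0_tf)    # Psi(t, tf) = Psi(t, t0) Psi(t0, tf)
```

For this constant A the matrix has a closed form (a rotation through the angle w(t - t_f)), which gives a convenient check; a time-varying A would need the same integration but has no such simple check.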

The initial condition of equation (7) will be constant from iteration to iteration.
Hence, the initial condition on $\partial\vec{x}/\partial\sigma$ needed to integrate equation (9) is

$\frac{\partial\vec{x}}{\partial\sigma}(t_0,\sigma) = 0$


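Since a terminal payoff is being considered, the variation in the state must be
evaluated at $t = t_f$. The total variation in the terminal state $\vec{x}_f \triangleq \vec{x}(t_f(\sigma),\sigma)$ can be
calculated from the expression

$\frac{d\vec{x}_f}{d\sigma} = \frac{\partial\vec{x}}{\partial\sigma}(t_f,\sigma) + \frac{\partial\vec{x}}{\partial t}(t_f,\sigma)\,\frac{dt_f}{d\sigma}$

which results in (from eqs. (6) and (10))

$\frac{d\vec{x}_f}{d\sigma} = \int_{t_0}^{t_f} \Psi^T(\tau,t_f,\sigma)\left[B(\tau,\sigma)\frac{\partial\vec{u}}{\partial\sigma}(\tau,\sigma) + C(\tau,\sigma)\frac{\partial\vec{v}}{\partial\sigma}(\tau,\sigma)\right]d\tau$

$\qquad + \vec{f}(\vec{x}(t_f,\sigma),\vec{u}(t_f,\sigma),\vec{v}(t_f,\sigma))\,\frac{dt_f}{d\sigma}$    (12)

If a fixed duration game is being considered, $dt_f/d\sigma$ will be zero. However, for
variable termination time, $dt_f/d\sigma$ must be evaluated from the termination condition (eq. (8)),

$J(\vec{x}(t_f,\sigma)) = 0$

Differentiation of equation (8) yields

$\frac{dJ}{d\sigma} = \nabla_x^T J(\vec{x}_f)\,\frac{d\vec{x}_f}{d\sigma} = 0$    (13)

The term $dt_f/d\sigma$ may now be eliminated between equations (12) and (13) and an expression
for $d\vec{x}_f/d\sigma$ can be obtained independent of $dt_f/d\sigma$. This result is

$\frac{d\vec{x}_f}{d\sigma} = \Phi(\vec{x}_f,\vec{u}_f,\vec{v}_f)\int_{t_0}^{t_f} \Psi^T(\tau,t_f,\sigma)\left[B(\tau,\sigma)\frac{\partial\vec{u}}{\partial\sigma}(\tau,\sigma) + C(\tau,\sigma)\frac{\partial\vec{v}}{\partial\sigma}(\tau,\sigma)\right]d\tau$    (14)

where

$\Phi(\vec{x}_f,\vec{u}_f,\vec{v}_f) \triangleq I - \frac{\vec{f}(\vec{x}_f,\vec{u}_f,\vec{v}_f)\,\nabla_x^T J(\vec{x}_f)}{\nabla_x^T J(\vec{x}_f)\,\vec{f}(\vec{x}_f,\vec{u}_f,\vec{v}_f)}$

$\vec{u}_f \triangleq \vec{u}(t_f(\sigma),\sigma)$

$\vec{v}_f \triangleq \vec{v}(t_f(\sigma),\sigma)$    (15)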
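The role of the matrix in equation (15) can be checked numerically. The sketch below assumes the form that results from eliminating the final-time derivative between equations (12) and (13), namely the identity matrix minus f times the row vector grad J, divided by the scalar (grad J) f. The function name and the sample vectors are hypothetical. With that form, any terminal variation passed through the matrix stays on the termination surface to first order.

```python
def phi_matrix(f, grad_j):
    """Build Phi = I - f grad_j^T / (grad_j^T f) for plain Python lists f and grad_j."""
    denom = sum(g * f_i for g, f_i in zip(grad_j, f))
    if abs(denom) < 1e-12:
        # The trajectory is tangent to the termination surface; Phi is undefined there.
        raise ValueError("grad J is orthogonal to f at the terminal state")
    n = len(f)
    return [[(1.0 if i == j else 0.0) - f[i] * grad_j[j] / denom for j in range(n)]
            for i in range(n)]


# Hypothetical terminal-state values, chosen only for illustration.
f_terminal = [1.0, -0.5, 2.0]
grad_j_terminal = [0.3, 1.2, -0.4]
phi = phi_matrix(f_terminal, grad_j_terminal)

# grad_j^T Phi: zero row, so equation (13) holds for any integral term in equation (14).
row_check = [sum(grad_j_terminal[i] * phi[i][j] for i in range(3)) for j in range(3)]
# Phi f: zero column, so variations along the trajectory direction are removed.
col_check = [sum(phi[i][j] * f_terminal[j] for j in range(3)) for i in range(3)]
```

For a fixed-duration game no such elimination is needed and the matrix reduces to the identity.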


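The total derivative of the payoff P with respect to $\sigma$ can be determined by
using equation (14) and noting that

$\frac{dP}{d\sigma} = \nabla_x^T P(\vec{x}_f)\,\frac{d\vec{x}_f}{d\sigma}$

Hence,

$\frac{dP}{d\sigma} = \nabla_x^T P(\vec{x}_f)\,\Phi(\vec{x}_f,\vec{u}_f,\vec{v}_f)\int_{t_0}^{t_f} \Psi^T(\tau,t_f,\sigma)\left[B(\tau,\sigma)\frac{\partial\vec{u}}{\partial\sigma}(\tau,\sigma) + C(\tau,\sigma)\frac{\partial\vec{v}}{\partial\sigma}(\tau,\sigma)\right]d\tau$    (16)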


Equation (16) gives the "slope" of the payoff with respect to the iteration variable $\sigma$. It
is possible to choose expressions for $\partial\vec{u}/\partial\sigma$ and $\partial\vec{v}/\partial\sigma$ from equation (16) that yield the most
negative and most positive values of $dP/d\sigma$. From reference 2, the steepest descent
direction (used for minimizing the payoff with respect to the control $\vec{u}$) is

$-B^T(t,\sigma)\,\Psi(t,t_f,\sigma)\,\Phi^T(\vec{x}_f,\vec{u}_f,\vec{v}_f)\,\nabla_x P(\vec{x}_f)$

and the steepest ascent direction (used for maximizing the payoff with respect to the
control $\vec{v}$) is

$+C^T(t,\sigma)\,\Psi(t,t_f,\sigma)\,\Phi^T(\vec{x}_f,\vec{u}_f,\vec{v}_f)\,\nabla_x P(\vec{x}_f)$

The control variations are now chosen as

$\frac{\partial\vec{u}}{\partial\sigma} = -K_u\,B^T(t,\sigma)\,\Psi(t,t_f,\sigma)\,\Phi^T(\vec{x}_f,\vec{u}_f,\vec{v}_f)\,\nabla_x P(\vec{x}_f)$

$\frac{\partial\vec{v}}{\partial\sigma} = +K_v\,C^T(t,\sigma)\,\Psi(t,t_f,\sigma)\,\Phi^T(\vec{x}_f,\vec{u}_f,\vec{v}_f)\,\nabla_x P(\vec{x}_f)$    (17)

where $K_u$ and $K_v$ are positive scalar constants whose selection is explained in the
appendix. The control modifications, where the controls are now updated by using the
first-order approximation, are of the form

$\vec{u}(t,\sigma+\Delta\sigma) = \vec{u}(t,\sigma) + \frac{\partial\vec{u}}{\partial\sigma}\,\Delta\sigma$

$\vec{v}(t,\sigma+\Delta\sigma) = \vec{v}(t,\sigma) + \frac{\partial\vec{v}}{\partial\sigma}\,\Delta\sigma$

The iterative process is continued until $\partial\vec{u}/\partial\sigma$ and $\partial\vec{v}/\partial\sigma$ approach zero and no improvement
is observed in the payoff. Notice that saturation may occur if limits are placed on $\vec{u}(t,\sigma)$
and $\vec{v}(t,\sigma)$. In that case the system will be converged when the controls saturate, even
though the control derivatives do not approach zero. Details of the iteration sequence to
ensure convergence are found in the appendix, as are suggestions for handling the variable
terminal time.
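The update rule and the remark about saturation can be illustrated on a deliberately small problem. The following sketch is not the paper's program and makes several assumptions: a hypothetical scalar system with state rate u + v, a fixed final time (so the matrix of equation (15) is the identity), a terminal payoff equal to half the squared final state, control bounds of 1 for the pursuer and 0.5 for the evader, and arbitrary gains with a unit step in the iteration variable. With A = 0 and B = C = 1, both gradient directions reduce to the final state itself. The two-phase schedule (descent only, then alternating descent and ascent) follows the description in the summary.

```python
def simulate(x0, u, v, dt):
    """Euler-integrate the hypothetical scalar dynamics x' = u + v."""
    x = x0
    for u_k, v_k in zip(u, v):
        x += (u_k + v_k) * dt
    return x


def clip(value, low, high):
    return max(low, min(high, value))


def solve_toy_game(x0=2.0, t_final=1.0, steps=50, k_u=0.5, k_v=0.5,
                   descent_iters=20, alternating_iters=50):
    dt = t_final / steps
    u = [0.0] * steps  # nominal pursuer control
    v = [0.0] * steps  # nominal evader control

    def terminal_gradient():
        # grad P = x_f for P = x_f**2 / 2; B = C = 1 and Psi = Phi = 1 in this toy case.
        return simulate(x0, u, v, dt)

    # Phase 1: pursuer-only steepest descent against the fixed nominal evader control.
    for _ in range(descent_iters):
        g = terminal_gradient()
        u = [clip(u_k - k_u * g, -1.0, 1.0) for u_k in u]

    # Phase 2: alternate a descent step for the pursuer and an ascent step for the evader.
    for _ in range(alternating_iters):
        g = terminal_gradient()
        u = [clip(u_k - k_u * g, -1.0, 1.0) for u_k in u]
        g = terminal_gradient()
        v = [clip(v_k + k_v * g, -0.5, 0.5) for v_k in v]

    x_f = simulate(x0, u, v, dt)
    return u, v, 0.5 * x_f ** 2


u_opt, v_opt, payoff = solve_toy_game()
```

In this toy case both controls end on their bounds while the gradient stays nonzero, which is the saturated form of convergence described above.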


APPLICATION TO A SIMPLIFIED DIFFERENTIAL GAME 

Analytic Solution 

Early in the present study it became evident that the sheer size of problems 
involving two airplanes inhibited a clear understanding of certain problem areas. Such 
problems as the type of solution or whether the gradient method would converge to a 
solution could not be readily investigated in the high-dimensional states. Also it became 
apparent that the gradient method could be checked by a comparison with some analytic 
results. For these reasons, a simplified game of lower dimensions was solved; however, 
the game still embodied the basic concepts of the higher dimensional games. 

The geometry of the simplified game is shown in figure 5 for an inertial coordinate
system. The pursuer A moves with constant velocity $w_1$ and controls his radius of
turn. The evader B, moving with a different constant velocity $w_2$, controls his velocity
direction $\psi$ and is free to change it instantaneously. The equations of motion are


$\dot{x}_A = w_1 \sin\theta$

$\dot{y}_A = w_1 \cos\theta$

$\dot{\theta} = \frac{w_1}{R_c}\,\phi$

$\dot{x}_B = w_2 \cos(\theta - \psi)$

$\dot{y}_B = w_2 \sin(\theta - \psi)$    (18)


To reduce the order of the system, a relative coordinate system is defined as
illustrated in figure 6. The origin of the relative system is centered in and moving with the
pursuer so that the pursuer's velocity is always along the positive $x_1$-axis. Notice that
two state variables are now adequate to describe the system completely. The equations
of motion are the same as those given by Isaacs in reference 1 for the homicidal chauffeur
game

$\dot{x}_1 = \omega\phi x_2 + w_2 \cos\psi - w_1$

$\dot{x}_2 = -\omega\phi x_1 + w_2 \sin\psi$    (19)

where

$\omega = \frac{w_1}{R_c}$

the pursuer's turning rate control is

$-1 \le \phi \le +1$

and the evader's direction control is

$-\pi \le \psi \le \pi$

The payoff will be terminal and is taken as the distance r from the target point
(R,0) at the terminal time. Only one pass will be permitted, the game terminating at the
closest approach to the target point for each run. It is further stipulated that $\dot{r} < 0$
throughout the region of play. To simplify computation, the payoff may be redefined
as $P = r^2/2$, and $J = \dot{P} = 0$ can be taken as the termination condition. The problem now
is to determine controls $\phi$ and $\psi$ such that $P(\vec{x}(t_f))$ is a minimum with respect
to $\phi$ and a maximum with respect to $\psi$ with $\dot{r} = 0$ and with $t_f$ determined by the
satisfaction of

$J = \left[-\omega\phi R x_2 + (x_1 - R)\,w_2\cos\psi + w_2 x_2 \sin\psi - w_1(x_1 - R)\right] = 0$
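The termination function is the time derivative of the redefined payoff along the relative equations of motion, and that relationship is easy to confirm numerically. The sketch below assumes the relative dynamics of the homicidal chauffeur form (turn-rate terms proportional to the pursuer control, evader velocity components w2 cos(psi) and w2 sin(psi), and a pursuer speed w1 along the first axis) and a payoff equal to half the squared distance from the point (R, 0). Function names and the sample numbers are hypothetical.

```python
import math


def relative_dynamics(x1, x2, phi, psi, w1, w2, omega):
    """Relative equations of motion in pursuer-fixed coordinates."""
    dx1 = omega * phi * x2 + w2 * math.cos(psi) - w1
    dx2 = -omega * phi * x1 + w2 * math.sin(psi)
    return dx1, dx2


def payoff(x1, x2, R):
    """Half the squared distance from the target point (R, 0)."""
    return 0.5 * ((x1 - R) ** 2 + x2 ** 2)


def termination_function(x1, x2, phi, psi, w1, w2, omega, R):
    """J = dP/dt evaluated along the relative dynamics."""
    return (-omega * phi * R * x2
            + (x1 - R) * w2 * math.cos(psi)
            + w2 * x2 * math.sin(psi)
            - w1 * (x1 - R))


# Hypothetical sample point and parameters, for illustration only.
params = dict(w1=1.0, w2=0.2, omega=0.2)
x1, x2, phi, psi, R = 1.0, 2.0, 0.5, 0.7, 3.0
dx1, dx2 = relative_dynamics(x1, x2, phi, psi, **params)

# Central difference of P along the velocity direction.
h = 1e-4
numeric_rate = (payoff(x1 + h * dx1, x2 + h * dx2, R)
                - payoff(x1 - h * dx1, x2 - h * dx2, R)) / (2 * h)
analytic_rate = termination_function(x1, x2, phi, psi, R=R, **params)
```

The terms proportional to the product of x1 and x2 cancel in the derivative, which is why only the target range R multiplies the turning term.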

The analytical treatment follows Isaacs' work in reference 1. Isaacs first sets up
the "main equation," similar to the Hamilton-Jacobi equation of the calculus of variations,

$\min_{\phi}\max_{\psi} H = 0$

$H = \sum_{i=1}^{n} V_i f_i$    (20)

where $V_i = \partial V/\partial x_i$ and V is the Value of the game defined as $V = \min_{\phi}\max_{\psi} P$.
Expansion of equations (20) yields

$\min_{\phi}\max_{\psi}\left[\omega\phi\,(V_1 x_2 - V_2 x_1) + w_2\,(V_1\cos\psi + V_2\sin\psi) - V_1 w_1\right] = 0$    (21)

Optimal controls $\phi^*$ and $\psi^*$ are selected as

$\phi^* = -\mathrm{sgn}\,A$

where

$A = V_1 x_2 - V_2 x_1$

$\sin\psi^* = \frac{V_2}{\rho}$

$\cos\psi^* = \frac{V_1}{\rho}$

where

$\rho = \sqrt{V_1^2 + V_2^2}$    (22)

Substituting the optimal controls into equation (21) produces the second form of the main
equation

$H = \omega\phi^* A + w_2\rho - w_1 V_1 = 0$
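The selection of the optimal controls can be checked by brute force. The sketch below assumes the expanded main equation in the form (turning term proportional to A) plus (evader term w2 times V1 cos(psi) + V2 sin(psi)) minus w1 V1, with A = V1 x2 - V2 x1. The function names and sample values are hypothetical, and the choice of zero turning control when A vanishes is an arbitrary placeholder for the singular case, which the text does not address here.

```python
import math


def optimal_controls(V1, V2, x1, x2):
    """Return (phi*, psi*) that minimize over phi and maximize over psi."""
    A = V1 * x2 - V2 * x1
    if A > 0:
        phi_star = -1.0
    elif A < 0:
        phi_star = 1.0
    else:
        phi_star = 0.0  # singular case; placeholder value, not taken from the paper
    psi_star = math.atan2(V2, V1)  # gives cos(psi*) = V1/rho and sin(psi*) = V2/rho
    return phi_star, psi_star


def hamiltonian(V1, V2, x1, x2, phi, psi, w1, w2, omega):
    """Bracketed expression of the expanded main equation."""
    return (omega * phi * (V1 * x2 - V2 * x1)
            + w2 * (V1 * math.cos(psi) + V2 * math.sin(psi))
            - V1 * w1)


# Hypothetical sample values, for illustration only.
V1, V2, x1, x2 = 0.6, -0.8, 1.5, 2.0
w1, w2, omega = 1.0, 0.2, 0.2
phi_star, psi_star = optimal_controls(V1, V2, x1, x2)
h_star = hamiltonian(V1, V2, x1, x2, phi_star, psi_star, w1, w2, omega)

# Second form of the main equation: omega * phi* * A + w2 * rho - w1 * V1.
A = V1 * x2 - V2 * x1
rho = math.hypot(V1, V2)
second_form = omega * phi_star * A + w2 * rho - w1 * V1
```

Sweeping either control over its admissible range while holding the other at its optimal value should never improve on the corresponding player's result at the pair returned above.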


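Introducing next the concept of retrograde time $\tau$ such that $\frac{d}{d\tau} = -\frac{d}{dt}$ and using
the equation $\frac{dV_i}{d\tau} = \frac{\partial H}{\partial x_i}$, the retrograde path equations (RPE) become

$\frac{dx_1}{d\tau} = -\omega\phi^* x_2 - w_2\cos\psi^* + w_1$

$\frac{dx_2}{d\tau} = \omega\phi^* x_1 - w_2\sin\psi^*$

$\frac{dV_1}{d\tau} = -\omega\phi^* V_2$

$\frac{dV_2}{d\tau} = \omega\phi^* V_1$    (23)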

In order to integrate the RPE, the terminal conditions must be evaluated. The
terminal manifold here is the locus of points where J = 0 for both players playing optimally.
If parameters $S_3$ and $S_2$ are introduced such that

$x_1 = R + S_3\cos S_2$

$x_2 = S_3\sin S_2$    (24)

then $S_2$ and $S_3$ may be found from

$\min_{\phi}\max_{\psi}\, S_3\left[w_2\cos(S_2 - \psi) - w_1\cos S_2 - R\omega\phi\sin S_2\right] = 0$

Choosing $\psi^* = S_2$ and $\phi^* = \mathrm{sgn}(\sin S_2)$ leads to $S_3 = 0$ and

$S_2 = \sin^{-1}\left(\frac{w_2}{\sqrt{w_1^2 + R^2\omega^2}}\right) - \epsilon$

where

$\epsilon = \sin^{-1}\left(\frac{w_1}{\sqrt{w_1^2 + R^2\omega^2}}\right) = \cos^{-1}\left(\frac{R\omega}{\sqrt{w_1^2 + R^2\omega^2}}\right)$

Notice that $S_2$ is independent of $S_3$ and constant for a given problem since
$w_2$, $w_1$, $\omega$, and R are specified constants. Since $S_2$ is constant, the terminal
manifold is a line originating at the target point ($S_3 = 0$) and extending outward to infinity on
both sides of the $x_1$-axis; therefore, the solution need be found only for the right half
plane.

Figure 7 shows the terminal lines for various ratios of $w_2$ to $w_1$, when
$R/R_c = 0.6$. The lowest line, where $w_2/w_1 = 0$, corresponds to the case of an immobile
evader. As $w_2$ increases, the play becomes more favorable to the evader. The
terminal line moves upward, reducing the region in which the pursuer can capture the evader. At
$w_2 = w_1$, the evader can force $\dot{r}$ to zero by flying directly away from the pursuer; hence,
termination can also occur on the $x_1$-axis. These two terminal lines then converge to one
line again at

$w_2 = \sqrt{w_1^2 + R^2\omega^2}$

For

$w_2 > \sqrt{w_1^2 + R^2\omega^2}$

no solution exists; thus, the evader can now everywhere force $\dot{r} > 0$ and escape.

It has been observed that the game terminates whenever the evader is forced across 
the terminal line. But it may well be that the optimal paths actually terminate on only a 
part of the terminal line. The part on which termination may occur is called the "usable 
part of the terminal manifold." The boundary of the usable part will delineate between 
the two regions in which one player or the other has control over termination. 

Consider now the situation when the evader is an infinitesimal distance above the
terminal line C. The evader wishes to force termination to prevent his payoff from
worsening further and hence wishes to force the component of $\dot{\vec{x}}$ normal to C to be
downward or negative. Likewise, the pursuer wishes to prevent termination so that his
payoff will improve. The boundary of the usable part will then be just the point where the
normal component to the terminal line is zero under optimal play. Expressed
mathematically, this condition becomes

$\min_{\psi}\max_{\phi}\left(\dot{x}_1\sin S_2 - \dot{x}_2\cos S_2\right) = 0$

Notice the reversal of the players' ordinary minimizing and maximizing roles.
Substituting the equations of motion and solving for $S_3$ yields

$S_3 = \frac{w_2 + w_1\sin S_2 - \omega R\cos S_2}{\omega}$    (25)

Equation (25) defines the boundary of the usable part as a point on each of the terminal
lines. Figure 8 shows this boundary as a function of $w_2$ for each of the terminal lines
presented in figure 7. The terminal line is usable for any value of $S_3$ less than the
value given in equation (25) and nonusable for any larger value of $S_3$. The optimal path
equations (eqs. (23)) may then be integrated back through the playing space from points on
the usable part of the terminal line.
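The boundary value in equation (25) can be checked by evaluating the normal velocity component directly. The sketch below assumes the relative equations of motion of the homicidal chauffeur form and the parametrization of the terminal line by the angle S2 and distance S3 from the target point. The angle used here is an arbitrary hypothetical number, not one computed from the terminal-line formula, because the check is an algebraic identity that holds for any angle. With the roles reversed, the pursuer uses a full turn and the evader heads at a right angle to the line.

```python
import math


def usable_boundary(S2, w1, w2, omega, R):
    """Distance S3 along the terminal line at which the normal velocity vanishes."""
    return (w2 + w1 * math.sin(S2) - omega * R * math.cos(S2)) / omega


def normal_velocity(S2, S3, phi, psi, w1, w2, omega, R):
    """Component of the relative velocity normal to the terminal line at (S2, S3)."""
    x1 = R + S3 * math.cos(S2)
    x2 = S3 * math.sin(S2)
    dx1 = omega * phi * x2 + w2 * math.cos(psi) - w1
    dx2 = -omega * phi * x1 + w2 * math.sin(psi)
    return dx1 * math.sin(S2) - dx2 * math.cos(S2)


# Hypothetical parameters and line angle, for illustration only.
w1, w2, omega, R = 1.0, 0.2, 0.2, 3.0
S2 = 1.9
S3 = usable_boundary(S2, w1, w2, omega, R)

phi_star = 1.0                  # pursuer maximizes the normal component (roles reversed)
psi_star = S2 + math.pi / 2     # evader minimizes it: sin(S2 - psi) = -1
boundary_value = normal_velocity(S2, S3, phi_star, psi_star, w1, w2, omega, R)
```

The full-turn choice for the pursuer is the maximizing one only while S3 + R cos(S2) is positive, which holds for the sample numbers used here.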

Before integrating the equations, the variables and optimal controls must be
evaluated on the terminal line. The Value ($\min_{\phi}\max_{\psi} P$) of the game is given by $S_3$. From
Isaacs (ref. 1, ch. 4)

$\frac{\partial V}{\partial S_K} = \sum_i \frac{\partial V}{\partial x_i}\frac{\partial x_i}{\partial S_K} = \sum_i V_i\,\frac{\partial x_i}{\partial S_K}$    (K = 2,3)    (26)

where $x_i$ is given as a function of $S_2$ and $S_3$ in equations (24). Solving the two
equations generated by equation (26) for $V_1$ and $V_2$ yields

$V_1 = \cos S_2$

$V_2 = \sin S_2$

on the terminal manifold. With $V_1$ and $V_2$ determined, the optimal controls may also
be found as

$\psi^* = S_2$

$\phi^* = \mathrm{sgn}(\sin S_2)$


The retrograde equations and initial (retrograde) conditions now become

$\frac{dx_1}{d\tau} = -\omega x_2 - w_2\frac{V_1}{\rho} + w_1 \qquad x_1(0) = R + S_3\cos S_2$

$\frac{dx_2}{d\tau} = \omega x_1 - w_2\frac{V_2}{\rho} \qquad x_2(0) = S_3\sin S_2$

$\frac{dV_1}{d\tau} = -\omega V_2 \qquad V_1(0) = \cos S_2$

$\frac{dV_2}{d\tau} = \omega V_1 \qquad V_2(0) = \sin S_2$    (27)

These equations may be integrated to yield

$V_1 = \cos(S_2 + \omega\tau)$

$V_2 = \sin(S_2 + \omega\tau)$

$x_1 = R\cos\omega\tau + (S_3 - w_2\tau)\cos(S_2 + \omega\tau) + \frac{w_1}{\omega}\sin\omega\tau$

$x_2 = R\sin\omega\tau + (S_3 - w_2\tau)\sin(S_2 + \omega\tau) + \frac{w_1}{\omega}(1 - \cos\omega\tau)$    (28)
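The closed-form paths can be compared with a direct numerical integration of the retrograde equations. The sketch below is an independent check, not the paper's procedure: it assumes the retrograde equations with the pursuer control fixed at +1, starts from a point on the terminal line, and uses a fourth-order Runge-Kutta integrator. The parameter values, the line angle, and the function names are hypothetical.

```python
import math


def rpe_rhs(state, w1, w2, omega):
    """Right-hand side of the retrograde path equations with phi* = +1."""
    x1, x2, V1, V2 = state
    rho = math.hypot(V1, V2)
    return (-omega * x2 - w2 * V1 / rho + w1,
            omega * x1 - w2 * V2 / rho,
            -omega * V2,
            omega * V1)


def integrate_rpe(state, tau_final, steps, w1, w2, omega):
    """Fourth-order Runge-Kutta integration in retrograde time."""
    dt = tau_final / steps
    for _ in range(steps):
        k1 = rpe_rhs(state, w1, w2, omega)
        k2 = rpe_rhs(tuple(s + dt / 2 * k for s, k in zip(state, k1)), w1, w2, omega)
        k3 = rpe_rhs(tuple(s + dt / 2 * k for s, k in zip(state, k2)), w1, w2, omega)
        k4 = rpe_rhs(tuple(s + dt * k for s, k in zip(state, k3)), w1, w2, omega)
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


def closed_form(tau, S2, S3, w1, w2, omega, R):
    """Integrated solution for (x1, x2, V1, V2) at retrograde time tau."""
    angle = S2 + omega * tau
    x1 = (R * math.cos(omega * tau) + (S3 - w2 * tau) * math.cos(angle)
          + w1 / omega * math.sin(omega * tau))
    x2 = (R * math.sin(omega * tau) + (S3 - w2 * tau) * math.sin(angle)
          + w1 / omega * (1.0 - math.cos(omega * tau)))
    return (x1, x2, math.cos(angle), math.sin(angle))


# Hypothetical parameters and terminal point, for illustration only.
w1, w2, omega, R = 1.0, 0.2, 0.2, 3.0
S2, S3, tau_final = 1.9, 2.0, 5.0

start = closed_form(0.0, S2, S3, w1, w2, omega, R)   # terminal-line initial conditions
numeric = integrate_rpe(start, tau_final, 1000, w1, w2, omega)
analytic = closed_form(tau_final, S2, S3, w1, w2, omega, R)
max_error = max(abs(a - b) for a, b in zip(numeric, analytic))
```

Agreement of the two results to integration accuracy supports the term-by-term form of the closed-form solution, including the factor multiplying the pursuer speed.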


The complete solution is shown in figure 9 for the case $w_2 = 0.2$, $w_1 = 0.1$, R = 3,
and $\omega = 0.2$. Remember that since the coordinate system is centered in the pursuer and
moving with it, the optimal paths show the motion of the evader relative to the pursuer.
The analytic solution divides the relative space into three areas with each area
representing a type of solution for starting points within that area. The shaded region is a
termination zone where the player that desires termination may force it to occur. In this
zone the evader can always force $\dot{r} > 0$; hence, the game will terminate instantly and no
optimal paths will result. The terminal line which separates the shaded region from the
rest of the playing space was calculated from the expression

$\min_{\phi}\max_{\psi} J = 0$

and the points in the shaded region then satisfy

$\min_{\phi}\max_{\psi} J > 0$

The unshaded region of the playing space can be divided into two areas: one where
nonunique solutions occur and one where unique solutions occur. Immediately above the
terminal line is an area in which a unique optimal solution for both players can be found.
The paths represent optimal paths for any set of initial conditions in the region. These
optimal paths will terminate on the crosshatched area representing the usable part of the
terminal manifold. For example, for a set of initial conditions at point A, the solution
is for the state to follow the path indicated until the game ends at point B. The pursuer
is required to use a saturated control (full turn) but the evader is, nevertheless, able to
prevent the payoff from reaching zero. The evader chooses his optimal strategy in
accordance with equations (22) and (28). Notice that the evader will be pointed directly away
from the target point along the terminal line at termination. If the evader were to fly any
other strategy, the pursuer would gain. Likewise, if the pursuer does not use saturated
controls, the evader will gain.

The boundary of the unique region is the unique solution along which the payoff will
be zero. That is, the pursuer can only force the evader to the target point by turning full
right. For points above this path, the pursuer no longer needs to use saturated controls
to obtain zero payoff; hence, the solutions become nonunique. The payoff for all initial
conditions in this region is the same, namely, zero. Any trajectory that brings the evader
to the target point is optimal. Because the initial conditions lie within the region that
contains nonunique solutions, it is possible to arrive at the same terminal state by many
different paths. Consequently, different nominal paths will give different converged
solutions but will yield the same payoff.

Gradient Solution to the Simplified Differential Game 

The gradient method was applied to the problem just solved by using the procedure 
described in the appendix. These results are shown in figures 10, 11, and 12. 

Each calculation of the gradients $\partial\vec{u}/\partial\sigma$ and $\partial\vec{v}/\partial\sigma$ is considered to be one iteration;
however, each iteration may have from one to ten minor cycles. (See the appendix.) As 
a result, one must be very careful in comparing iteration times and number of iterations 
for different cases. In figure 10 an initial condition is chosen in the unique trajectory 
region. The nominal path, an intermediate nonoptimal path, and the final converged solu- 
tion are presented. A comparison with figure 9 shows the extremely good agreement 
with the optimal path found analytically. The method required 41 iterations to converge 
and averaged some 9 seconds per iteration using a Control Data 6600 computer. 

Figures 11 and 12 illustrate the solutions in the nonunique region for different nomi- 
nal paths. The gradient method required about 5 seconds per iteration for these cases. 
Figure 11(a) shows the nominal path (straight flight for both players) and iterated paths 
leading to a converged solution after seven iterations. Since this converged path is known 
to be in the nonunique region, a new game with unique solutions can be formed for this 
region. The payoff for this game is the time to reach the target point. The evader still 
maximizes and the pursuer minimizes this payoff with termination remaining r = 0.
This time-optimal game may be solved by using the same approach presented in the
general theory above. One simply increases the dimension of the state vector by one with
$x_{n+1}$ being time and $\dot{x}_{n+1} = 1$. A typical solution for this time and position optimal
game is shown in figure 11(b) by using the optimal solution from figure 11(a) as the
starting point and iterating for 64 iterations. Notice that the pursuer now initiates a full
turn followed by straight flight to bring the evader down the $x_1$-axis to the target point.
This type of maneuver is found to be the solution to the time-optimal game in reference 1.

Figure 12 shows the results when a different nominal trajectory is used with the
same initial conditions as in figure 11. In figure 12(a), the nominal path results from
straight flight by the evader ($\psi = 0$) and a full right turn by the pursuer ($\phi = +1$). The
resulting position optimal path after eight iterations is quite different from that in
figure 11(a) although the payoff is the same, a fact which further verifies the nonuniqueness
of this region. But reverting to the time-optimal game should yield unique results
regardless of the nominal path chosen. A comparison of figures 11(b) and 12(b) shows reasonable
agreement in the solutions. For both nominal paths the time-optimal strategy is shown
to be a full turn followed by straight flight to the target point.


Application to a Two-Airplane Game 


As another example of the application of the gradient method, the results are pre- 
sented for the two-airplane problem discussed in the section entitled "Problem Statement." 
The equations of motion are written for an engagement at constant altitude. To simplify 
computation, a coordinate system centered in and moving with the attacker is used, the $x_1$
coordinate being measured along the pursuer's velocity vector.


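The relative position equations are given by

$$\dot{x}_1 = \frac{u_1 a_{n,A}}{V_A}\, x_2 - V_A + V_B \cos\theta$$

$$\dot{x}_2 = -\frac{u_1 a_{n,A}}{V_A}\, x_1 + V_B \sin\theta \qquad (29)$$

The relative angle equation is

$$\dot{\theta} = \frac{v_1 a_{n,B}}{V_B} - \frac{u_1 a_{n,A}}{V_A}$$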
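
The relative kinematics can be sketched in code as follows. This is an illustrative reading of the equations rather than the report's implementation: the function name `relative_kinematics`, the argument names, and in particular the sign conventions (positive turning control taken as a right turn, relative angle measured from the pursuer's velocity vector) are assumptions. The speed equations are not included.

```python
import math


def relative_kinematics(x1, x2, theta, V_A, V_B, u1, v1, a_nA, a_nB):
    """Time derivatives of the relative position and relative angle.

    Assumed conventions (not confirmed by the source): the frame is centered
    on and rotates with pursuer A, x1 lies along A's velocity vector, and
    theta is the heading of evader B relative to A.
    """
    omega_A = u1 * a_nA / V_A   # pursuer turn rate, rad/sec
    omega_B = v1 * a_nB / V_B   # evader turn rate, rad/sec
    x1_dot = omega_A * x2 - V_A + V_B * math.cos(theta)
    x2_dot = -omega_A * x1 + V_B * math.sin(theta)
    theta_dot = omega_B - omega_A
    return x1_dot, x2_dot, theta_dot


# Straight flight for both airplanes with the evader crossing at 90 degrees.
rates = relative_kinematics(
    x1=7315.0, x2=4115.0, theta=math.radians(90.0),
    V_A=230.0, V_B=190.5, u1=0.0, v1=0.0, a_nA=29.44, a_nB=29.44,
)
```

In this straight-flight case the relative angle stays constant, the evader closes along the $x_1$ direction at the pursuer's speed, and drifts laterally at its own speed.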


The speed equations for the two airplanes are 



where $C_{D,0}$ is the drag coefficient at zero angle of attack and $C_{D,i}$ is a coefficient
representing induced drag resulting from the turn.

Since $a_{n,A}$ and $a_{n,B}$ are maximum turning accelerations and $T_A$ and $T_B$
are maximum thrusts, each airplane has two controls with the following constraints:

(Full left turn) $-1 \le u_1, v_1 \le +1$ (Full right turn) turning control

$0 \le u_2, v_2 \le +1$ (Full thrust) thrust control

The aerodynamic variables $\rho$, $S$, $C_{D,0}$, $C_{D,i}$, $T$, and $a_n$ are chosen for a
particular flight condition and then held constant throughout the engagement. For the
results presented here, the airplanes are considered to have the same aerodynamic
characteristics, those of a fighter airplane operating at 9144 meters altitude. The drag

Cr) nP® rt Cn ipS _ 

characteristics are — ^ » 4.968 X 10" '/ me ter and — ^ = 8.29 x 10"'/ me ter. 

2m 2m 

Maximum thrust per unit mass is 0.03581 m/sec^2 and maximum turning acceleration
is 29.44 m/sec^2 (limited to a 3g turn). To prevent a standoff situation, airplane A
is given an initial velocity advantage over airplane B. These initial conditions
are $V_A = 230$ m/sec, $V_B = 190.5$ m/sec, $\theta = 90°$, $x_2 = 4115$ meters,
and $x_1 = 7315$ meters.

To formulate the problem as a game, let one player A be the pursuer and the other
player B, the evader. The payoff, as in the previous game, will be the distance from
a target point ($x_2 = 0$, $x_1 = R$) directly in front of the pursuer and where R = 915 meters.

The pursuer will attempt to minimize the distance and force P = 0 while the evader
tries to maximize the distance. For this game the contours of constant payoff are
ellipses centered around the pursuer's target point as illustrated in figure 2. Thus,

$$P = C_1 x_2^2 + C_2 (x_1 - R)^2$$

where the constants were chosen as $C_1 = 16.8 \times 10^{-6}$ and $C_2 = 10.76 \times 10^{-6}$,
representing the boundaries of a capture zone for a typical
fighter airplane. As in the previous example, only one pass will be considered, the
game terminating at the first minimum of the payoff.
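
As a worked check of the payoff definition, the short sketch below evaluates the elliptical payoff at the initial relative position. The function name `payoff` is hypothetical, and the reading of the constants as multiples of 10^-6 and of the initial position as $x_1 = 7315$ meters, $x_2 = 4115$ meters is an assumption taken from the surrounding text.

```python
C1 = 16.8e-6    # weight on lateral offset x2 (assumed exponent of -6)
C2 = 10.76e-6   # weight on longitudinal offset from the target point
R = 915.0       # target point distance ahead of the pursuer, meters


def payoff(x1, x2):
    """Elliptical payoff P = C1*x2^2 + C2*(x1 - R)^2."""
    return C1 * x2 ** 2 + C2 * (x1 - R) ** 2


initial = payoff(7315.0, 4115.0)
print(round(initial, 1))  # → 725.2
```

The value is close to the initial payoff of 725.7 quoted with the results, which supports this reading of the constants; the small difference may come from rounding in the tabulated quantities.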

The gradient-method results for a lateral pursuit case using two different nominal
controls are shown in figures 13 and 14. For visualization purposes the results
are presented in absolute coordinates with the paths marked at 5-second intervals.
The dashed lines indicate the nominal trajectories for each airplane whereas the solid
lines are the converged optimal paths. The nominal controls in figure 13 show the
pursuer and evader flying straight at intermediate thrust and with the pursuer turning
right after 20 seconds. The converged path after 28 iterations has the evader turning
full right at minimum thrust and the pursuer using a combination of turns at minimum
thrust to capture. Figure 14 shows the same pursuer nominal with the evader nominally
turning left and then proceeding in straight flight. After 47 iterations the converged
trajectory shows the evader using a full left turn to meet the pursuer head on. Both
airplanes again use minimum thrust.

These two results are considered examples of nonunique solutions since widely
differing strategies yield essentially the same payoff. The payoff for figure 13 is $6.15 \times 10^{-4}$ and
for figure 14 is $2.81 \times 10^{-4}$. When compared with an initial payoff of 725.7, these two
solutions are extremely close. Notice that since the payoff was independent of the
relative angle, sizable angular differences could occur without changing the payoff. In both
of these solutions, the converged evader strategy is a maximum turn at minimum thrust.
Nevertheless, the pursuer's velocity advantage enables him to close in on the evader to
the desired range.

The one-pass restriction ($\dot{P} = 0$) in the games considered in this paper is more
important than may appear at first glance. In the simplified game solved earlier, this
restriction generated the line on which termination could occur. The effects there
could readily be seen in that points where a solution might not occur were eliminated.
However, in this larger problem the effect is not so easily visualized. One effect is to
prevent any type of swerve maneuver such as in the homicidal chauffeur problem of
reference 1 in which the pursuer initially turns away and then swings back to capture.
The airplanes here are forced to continually close in on one another, or nearly so.
Depending, of course, on the initial conditions, one can reasonably argue that the
pursuer can always gain if this one-pass restriction is lifted. However, the authors
believe this restraint is not unrealistic but often representative of actual
pursuit-evasion.


CONCLUDING REMARKS 

A gradient method has been presented in this paper which may be applied to 
solving general zero-sum differential games. This method, a first step in developing 
a computational capability in game theory, is applicable to nonlinear multidimensional 
game problems representing realistic combat between two airplanes. Problems of 
this magnitude previously could not be solved. The technique requires selecting nominal 
controls which are then improved iteratively by a scheme consisting of a minimization 
phase followed by a minimizing-maximizing (min-max) phase.

The analytic solution to a simple differential game which is analogous to the 
particular aerial combat problem studied has been presented to give insight into the 
nature of the higher dimensional solution. This analytic solution revealed that different 
nominal engagements often produce different final solutions for the same initial condi- 
tions. This condition occurs because there is a region of space in which nonunique 
answers exist. Computational results have been presented that illustrate this behavior
for the problem solved analytically. The method was then extended to a larger more
realistic problem of two airplanes at constant altitude which also was found to exhibit this
nonunique feature.

The question of types of solutions to large-dimensional games is a difficult one. 
When no analytic solution is available, it is often difficult to determine whether solutions 
are min-max, max-min, saddle points, nonunique, or perhaps local solutions. Never- 
theless, any method which gives a solution to these higher order problems could be a 
valuable tool for studying such things as the effects of aircraft performance parameters 
on the outcome of an aerial combat engagement. 

Langley Research Center, 

National Aeronautics and Space Administration, 

Hampton, Va., September 13, 1971.


APPENDIX


COMPUTATIONAL ASPECTS 

The theory described in the section entitled "General Theory" has been implemented 
on the Control Data 6600 digital computer at the Langley Research Center in a FORTRAN 
program. Figure 15 shows a diagram of the steps presented below. The method starts 
with a nominal control table and then improves this estimate by updating the control table 
to move in the steepest descent-ascent direction. The method has evolved into the fol- 
lowing format: 

(a) Select a nominal control table as a function of time for each player. (Use of the
digital computer requires the control function to be discretized. Intervals of 1.0 and
0.5 second have generally been used.)

(b) Integrate the equations of motion from the given initial conditions by using the
control tables from step (a). Integrate simultaneously the transition matrix with initial
conditions $\Psi(t_0,t_0,\sigma) = I$ and store the products $B^T(\tau,\sigma)\,\Psi(\tau,t_0,\sigma)$ and $C^T(\tau,\sigma)\,\Psi(\tau,t_0,\sigma)$
for each control table time point. The integration is stopped when the termination
condition is satisfied.

(c) Calculate the control derivatives from equation (17) and update the controls at 
each point of the control table. 

(d) Determine the best step size by changing $K_u$ for each $K_v$ to keep the payoff
a minimum for each evader control change. Only the equations of motion will have to be 
integrated unless the termination time increases, in which case the transition matrix will 
also have to be integrated for the extended part of the table. To save computer time a 
large integration step size may be used here. 

(e) Once the final control step size is found, update the control table and repeat
steps (b) to (e) until convergence is evident. Convergence will be characterized by first,
a negligible change in the payoff from iteration to iteration, and second, the vanishing of
the control derivatives of both players. An automated criterion can be constructed by
taking some suitable time average of these derivatives, for example, the root-mean-square
value, and stopping the iterations whenever the measure falls below some specified
number.
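
The automated stopping criterion described in step (e) might be sketched as follows. The function names `rms` and `converged`, the representation of each player's control derivatives as a list sampled at the control-table time points, and the tolerance values are all assumptions made for illustration; the report does not state the thresholds it used.

```python
import math


def rms(values):
    """Root-mean-square of control derivatives sampled over the control table."""
    return math.sqrt(sum(x * x for x in values) / len(values))


def converged(payoff_prev, payoff_new, du_table, dv_table,
              payoff_tol=1e-6, grad_tol=1e-3):
    """Return True when both convergence conditions of step (e) hold.

    payoff_tol and grad_tol are hypothetical thresholds.
    """
    small_change = abs(payoff_new - payoff_prev) < payoff_tol
    small_gradients = rms(du_table) < grad_tol and rms(dv_table) < grad_tol
    return small_change and small_gradients


# Nearly stationary payoff and small derivatives for both players.
done = converged(1.0, 1.0 + 1e-8, [1e-4] * 5, [0.0] * 5)

# Payoff still changing appreciably, so iteration should continue.
not_done = converged(1.0, 0.5, [1e-4] * 5, [0.0] * 5)
```

Requiring both conditions guards against stopping on a single flat iteration while the control derivatives are still large.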

The first consideration in applying the formulas to a game involves the selection of 
an admissible nominal engagement. It is important to select nominal engagements in the 
vicinity of the global optimal solution. But this task becomes increasingly difficult as the 
game becomes more complicated since intuition and experience fail. In high-dimensional
games and games with a complicated payoff, the payoff function may possess several local
optimal solutions. In gradient methods the iterations follow the direction of the "slope"
of the payoff; therefore, the controls will converge to the saddle point in whose neighbor- 
hood the iterations are initiated. A solution is to repeat the iterative sequence with vari- 
ous nominal strategies to search for the global optimal. The problem is complicated even 
more since games with terminal payoffs usually possess regions in which nonunique solu- 
tions exist. Since the payoff is a function of the terminal states alone, the players are 
only concerned with the final value of the payoff function and not with the paths used to 
arrive at this terminal state. In this case different nominals will give different optimal 
trajectories but will converge to the same payoff. A solution again is to try different nom- 
inals and compare solutions to search for the nonunique region. Another alternative is to 
change to a time-optimal game by including a function of time in the payoff; this change 
results in unique optimal trajectories. 

One of the first questions asked about any iterative procedure is that of convergence. 
Under what conditions is convergence guaranteed? In applying the present gradient tech- 
nique to differential games, the iteration sequence is found to be of extreme importance in 
assuring convergence. For purposes of explanation, consider a scalar problem where a 
function P(u,v) is to be minimized with respect to u and maximized with respect to v. 
Figure 16 shows contours of constant P plotted in the u,v plane. The intersection of
the two curves $\partial P/\partial u = 0$ and $\partial P/\partial v = 0$ represents the desired solution in function space.
The iteration scheme proceeds as follows: First, select nominal controls $(u_0, v_0)$
and minimize P with respect to u keeping v at $v_0$ until $\partial P/\partial u = 0$ is reached;
second, permit v to be updated but then minimize P with respect to u for the new v.
This procedure will force the iteration to follow the $\partial P/\partial u = 0$ curve to the desired solution.
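
The scalar iteration scheme can be illustrated with a small numerical sketch. The quadratic test function $P(u,v) = u^2 + 2uv - v^2$, the fixed step sizes `K_u` and `K_v`, and the tolerances are assumptions chosen only so that the saddle point at the origin is known in advance; the report adjusts its step sizes adaptively rather than holding them fixed.

```python
def P(u, v):
    """Hypothetical payoff: convex in u, concave in v, saddle point at (0, 0)."""
    return u * u + 2.0 * u * v - v * v


def dP_du(u, v):
    return 2.0 * u + 2.0 * v


def dP_dv(u, v):
    return 2.0 * u - 2.0 * v


def minimize_u(u, v, K_u=0.25, tol=1e-10, max_iter=1000):
    """Steepest descent on u with v held fixed, until dP/du is negligible."""
    for _ in range(max_iter):
        g = dP_du(u, v)
        if abs(g) < tol:
            break
        u -= K_u * g
    return u


def solve(u0, v0, K_v=0.1, tol=1e-8, max_iter=500):
    """Minimization phase, then ascent steps on v each followed by re-minimizing u."""
    u = minimize_u(u0, v0)          # phase 1: reach the dP/du = 0 curve
    v = v0
    for _ in range(max_iter):       # phase 2: move along that curve
        g = dP_dv(u, v)
        if abs(g) < tol:
            break
        v += K_v * g                # steepest ascent step for the maximizer
        u = minimize_u(u, v)        # return to the dP/du = 0 curve
    return u, v


u_star, v_star = solve(2.0, 1.0)
```

Starting from the nominal point (2, 1), the iterates first drop onto the line u = -v and then follow it to the origin, which mirrors the path the scheme is intended to trace in figure 16.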

Alternate iteration schemes may be used successfully in certain instances. For 
example, a sequence alternating minimization and maximization might be used. For the 
problem considered here, the solution to the minimization phase is known to exist from 
physical considerations for any v. However, the solution to a maximization phase at 
constant u does not generally exist. Thus, the sequence shown in figure 16 is the one 
which appears to be applicable to multidimensional problems since P can be chosen so 
that a definite minimum exists. But this sequence can guarantee only a max-min solution. 
A saddle-point solution may occur but is not ensured.

In the functional problems considered in this report, $K_u$ and $K_v$ are selected to
follow the iteration scheme illustrated in figure 16. Generally, $K_u$ and $K_v$ are
selected initially so that the control derivatives of both players are less than one-tenth of
the maximum control range. During the minimization phase, $K_v$ is set to zero and
iterations are performed only on the u control. During this phase, $K_u$ is adjusted to
give the largest change in payoff for each evaluation of the gradients. Upon entrance to
the min-max phase, both u and v are updated by using the initial $K_u$ and $K_v$.
Then $K_u$ is adjusted to give the minimum of the payoff by using the new updated v
control. That is, for each step in the u,v plane, the updating of u is adjusted to force
the payoff back to the minimum curve where $\partial P/\partial u = 0$.

As indicated in step (d) of the computing procedures, the final time may vary
significantly from iteration to iteration. This variable time format is handled through an
extrapolation and correction procedure in the variable-step-size routine. The procedure is
illustrated in figure 17. Upon entrance to the routine, begin an iteration sequence to
determine the best step size (that is, the best multiple or fraction of the control step, as discussed
above). For $\Delta\sigma = 0$ let the terminal time be designated by $t_{f,0}$. By using the updated
control ($\Delta\sigma \neq 0$), the equations of motion (1) are now integrated, extrapolation on the
control being used if necessary. The integration is stopped when the termination condition
is reached at some time $t_f$. If $t_f > t_{f,0}$, then as soon as time exceeds $t_{f,0}$ the
transition matrix equations are integrated, and the control variations of both players are
calculated for the extended part of the table. These control variations are
then added as a correction to the extrapolated control values when the controls are
updated. Because the time extensions between cycles are usually not large, this
procedure helps to minimize any errors due to extrapolation.

Figure 1.- Geometry of a simplified pursuit-evasion model.

Figure 2.- An example ordering of the airplane positions for $\theta_A = \theta_B$ and $V_A = V_B$.

Figure 3.- Path iterations in state space.

(a) Position-optimal game.
Figure 11.- Gradient solution for the nonunique region using a straight flight nominal.

(b) Time-optimal game.
Figure 12.- Concluded.

Figure 14.- Gradient results for the two-airplane game using a turning evader nominal.

Figure 15.- Flow chart of the computing steps.

Figure 16.- Convergence procedure for scalar controls.

Figure 17.- Extrapolation and correction procedure for variable final time.